---
title: Exploring Object Detection using Icevision w/ FastAI
keywords: fastai
sidebar: home_sidebar
nb_path: "20_subcoco_ivf.ipynb"
---

Download a Sample of COCO Data

The full COCO dataset is huge (~50GB?). For my self-education in object detection, with the intention of using a pretrained model for transfer learning, it is not practical to deal with a dataset this big as my first project. Luckily, the kind folks at FastAI have prepared some convenient subsets: the medium-sized 3GB https://s3.amazonaws.com/fast-ai-coco/coco_sample.tgz seems like a good candidate. The 800KB http://files.fast.ai/data/examples/coco_tiny.tgz, on the other hand, seems way too small and may not have enough data for adequate training.

{% raw %}
datadir = Path("workspace")

froot = "coco_sample"
fname = f"{froot}.tgz"
url = f"https://s3.amazonaws.com/fast-ai-coco/{fname}"
json_fname = datadir/froot/'annotations'/'train_sample.json'
img_dir = datadir/froot/'train_sample'

if not os.path.isdir(datadir/froot): 
    fetch_data(url, datadir, fname, chunk_size=1024*1024)
{% endraw %}
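The `fetch_data` helper above comes from the notebook's companion utility module (`subcoco_utils`, per the error message later in this post), so it isn't shown here. A minimal sketch of what such a chunked download-and-extract helper might look like, using only the standard library:

```python
import tarfile
import urllib.request
from pathlib import Path

def fetch_data(url, datadir, fname, chunk_size=1024 * 1024):
    """Download `url` to `datadir/fname` in chunks, then extract the tarball there."""
    datadir = Path(datadir)
    datadir.mkdir(parents=True, exist_ok=True)
    archive = datadir / fname
    with urllib.request.urlopen(url) as resp, open(archive, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)   # stream in 1MB chunks to bound memory use
            if not chunk:
                break
            out.write(chunk)
    with tarfile.open(archive) as tar:
        tar.extractall(path=datadir)        # e.g. creates datadir/coco_sample/...
```

The real helper likely also reports progress; this sketch keeps only the download/extract logic.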

If playing with the tiny COCO subset, use these values instead:

froot = "coco_tiny"
fname = f"{froot}.tgz"
url = f"http://files.fast.ai/data/examples/{fname}"
json_fname = datadir/froot/'train.json'
img_dir = datadir/froot/'train'

Check Annotations

Let's load and inspect the annotation file that comes with the dataset...

{% raw %}
with open(json_fname, 'r') as json_f:
    train_json = json.load(json_f)
    
train_json['categories'], train_json['images'][0], [a for a in train_json['annotations'] if a['image_id']==train_json['images'][0]['id'] ]
([{'id': 62, 'name': 'chair'},
  {'id': 63, 'name': 'couch'},
  {'id': 72, 'name': 'tv'},
  {'id': 75, 'name': 'remote'},
  {'id': 84, 'name': 'book'},
  {'id': 86, 'name': 'vase'}],
 {'id': 318219, 'file_name': '000000318219.jpg'},
 [{'image_id': 318219,
   'bbox': [505.24, 0.0, 47.86, 309.25],
   'category_id': 72},
  {'image_id': 318219,
   'bbox': [470.68, 0.0, 45.93, 191.86],
   'category_id': 72},
  {'image_id': 318219,
   'bbox': [442.51, 0.0, 43.39, 119.87],
   'category_id': 72}])
{% endraw %}
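The inline list comprehension above scans every annotation to find the boxes for one image, which is fine for a quick peek. For repeated per-image lookups it is cheaper to index the annotations once; a small sketch (`index_annotations` is my name, not part of the notebook's utilities):

```python
from collections import defaultdict

def index_annotations(train_json):
    """Group COCO annotations by image_id for O(1) per-image lookup."""
    by_image = defaultdict(list)
    for ann in train_json["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return by_image

# Usage: by_image = index_annotations(train_json); by_image[318219] -> list of box dicts
```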

Digest the Dataset for useful Stats

Do some basic analysis of the data to get numbers like total images, boxes, and average box count per image...

{% raw %}
stats = load_stats(train_json, img_dir=img_dir, force_reload=False) 
print(
    f"Categories {stats.num_cats}, Images {stats.num_imgs}, Boxes {stats.num_bboxs}, avg (w,h) {(stats.avg_width, stats.avg_height)}, "
    f"avg cats/img {stats.avg_ncats_per_img:.1f}, avg boxs/img {stats.avg_nboxs_per_img:.1f}, avg boxs/cat {stats.avg_nboxs_per_cat:.1f}.")

print(f"Image means by channel {stats.chn_means}, std.dev by channel {stats.chn_stds}")
  0%|          | 4/21837 [00:00<09:21, 38.87it/s]
Failed to read precomputed stats: No module named 'subcoco_utils'
100%|██████████| 21837/21837 [08:51<00:00, 41.11it/s]
Categories 6, Images 21837, Boxes 87106, avg (w,h) (575.6857626963424, 481.71420066859), avg cats/img 7.0, avg boxs/img 4.0, avg boxs/cat 14517.7.
Image means by channel [115.64436835 103.2992867   91.73613059], std.dev by channel [64.16724017 62.63021182 61.92975836]
{% endraw %} {% raw %}
stats.cat2name, stats.lbl2cat, stats.cat2lbl, stats.lbl2name
({62: 'chair', 63: 'couch', 72: 'tv', 75: 'remote', 84: 'book', 86: 'vase'},
 {1: 62, 2: 63, 3: 72, 4: 75, 5: 84, 6: 86, 0: 0},
 {62: 1, 63: 2, 72: 3, 75: 4, 84: 5, 86: 6, 0: 0},
 {1: 'chair', 2: 'couch', 3: 'tv', 4: 'remote', 5: 'book', 6: 'vase'})
{% endraw %}
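Note how the sparse COCO category ids (62, 63, 72, ...) are remapped to contiguous labels 1..6, with 0 reserved for background. A sketch of how such maps can be built (the real logic lives inside `load_stats`; this is a reconstruction that reproduces the output above):

```python
def build_label_maps(cat2name):
    """Map sparse COCO category ids to contiguous labels, reserving 0 for background."""
    cat2lbl = {cat: i + 1 for i, cat in enumerate(sorted(cat2name))}
    cat2lbl[0] = 0  # background keeps label 0
    lbl2cat = {lbl: cat for cat, lbl in cat2lbl.items()}
    lbl2name = {lbl: cat2name[cat] for cat, lbl in cat2lbl.items() if cat != 0}
    return cat2lbl, lbl2cat, lbl2name
```

Contiguous labels matter because the model's classification head indexes its outputs 0..num_classes, not by raw COCO ids.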

Load Data Using Custom Parser

To prevent bounding boxes from being too close to the image margin or too small, especially after augmentation transforms, I set min_margin_ratio = 0.05 and min_width_height_ratio = 0.05.

However, IceVision 2.0 now has autofix, which should address these issues; it does take a long time to run though...
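The exact filtering criteria live inside `SubCocoParser`; a plausible sketch of the two thresholds, for COCO-format boxes `[x, y, w, h]` (function name and details are my reconstruction):

```python
def bbox_ok(bbox, img_w, img_h, min_margin_ratio=0.05, min_width_height_ratio=0.05):
    """Reject boxes that hug the image border or are too small relative to the image."""
    x, y, w, h = bbox
    m = min_margin_ratio
    # any edge of the box within 5% of the image border -> reject
    if x < m * img_w or y < m * img_h:
        return False
    if x + w > (1 - m) * img_w or y + h > (1 - m) * img_h:
        return False
    # box narrower/shorter than 5% of the image in either dimension -> reject
    return w >= min_width_height_ratio * img_w and h >= min_width_height_ratio * img_h
```

Under this rule the tv boxes inspected earlier (all with y = 0.0) would be among the skipped annotations.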

{% raw %}
parser = SubCocoParser(stats, min_margin_ratio = 0.05, min_width_height_ratio = 0.05)
train_records, valid_records = parser.parse(autofix=False)
Skipped 5362 out of 21837 images

{% endraw %}

Show images with their corresponding labels and boxes:

{% raw %}
class_map = ClassMap(list(stats.cat2name.values()))
show_records(train_records[:4], ncols=2, class_map=class_map, show=True)
{% endraw %}

Define transforms - using Albumentations transforms out of the box

{% raw %}
size=384
train_tfms = tfms.A.Adapter([
    *tfms.A.aug_tfms(
        size=size, 
        presize=size+128,
        shift_scale_rotate = tfms.A.ShiftScaleRotate(shift_limit=.025, scale_limit=0.025, rotate_limit=9)
    ), 
    tfms.A.Normalize()
]) # mean=stats.chn_means, std=stats.chn_stds
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(size), tfms.A.Normalize()])
{% endraw %}
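`tfms.A.Normalize()` defaults to ImageNet statistics; the commented-out alternative would standardize with the dataset's own channel stats computed earlier. The underlying operation is just per-channel standardization, sketched here in plain numpy (the actual transform also handles pixel-value scaling):

```python
import numpy as np

def normalize(img, means, stds):
    """Per-channel standardization of a channels-last image: (pixel - mean) / std."""
    return (img.astype(np.float32) - np.asarray(means, dtype=np.float32)) \
        / np.asarray(stds, dtype=np.float32)
```

Using dataset-specific stats can help when the images' color distribution differs from ImageNet's, though with COCO-derived data the difference is usually minor.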

Create both training and validation datasets

{% raw %}
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
{% endraw %}

Create Model, Training and Validation Dataloaders

{% raw %}
# I ran this model w/ img 384x384x3 on my Dell XPS15 w/ GTX-1050 (4GB VRAM), 16GB RAM, ~20min/epoch.
bs=32
acc_cycs = 4 
backbone_name = "tf_efficientdet_lite0"
model = efficientdet.model(model_name=backbone_name, img_size=size, num_classes=len(stats.cat2name))
train_dl = efficientdet.train_dl(train_ds, batch_size=bs//acc_cycs, num_workers=6, shuffle=True)
valid_dl = efficientdet.valid_dl(valid_ds, batch_size=bs//acc_cycs, num_workers=6, shuffle=False)
{% endraw %}
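With `bs=32` and `acc_cycs=4`, the dataloaders feed micro-batches of 8 to fit in 4GB of VRAM, and fastai's `GradientAccumulation(bs)` (used in the training section) accumulates gradients until 32 samples have been seen, so the effective batch size stays 32. A toy numpy illustration of why this works (not IceVision code), using the mean-squared-error gradient of a linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 3)), rng.normal(size=32)
w = np.zeros(3)

def grad(Xb, yb, w):
    """Gradient of mean-squared error for a linear model on one batch."""
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

full = grad(X, y, w)  # one step on the full batch of 32
acc = np.mean([grad(X[i:i+8], y[i:i+8], w) for i in range(0, 32, 8)], axis=0)
assert np.allclose(full, acc)  # averaging 4 micro-batch gradients reproduces the full-batch step
```

For losses averaged over the batch, the mean of equal-sized micro-batch gradients equals the full-batch gradient, so accumulation trades memory for extra forward/backward passes without changing the optimization step.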

I have experimented with other models available out of the box in IceVision, but efficientdet works the best. You can replace backbone_name, backbone, and model with the following values to test:

backbone_name

  • "resnet_fpn.resnet18"

backbone

  • backbones.resnet_fpn.resnet18(pretrained=True)
  • backbones.resnet_fpn.resnet34(pretrained=True)
  • backbones.resnet_fpn.resnet50(pretrained=True) # Default
  • backbones.resnet_fpn.resnet101(pretrained=True)
  • backbones.resnet_fpn.resnet152(pretrained=True)
  • backbones.resnet_fpn.resnext50_32x4d(pretrained=True)
  • backbones.resnet_fpn.resnext101_32x8d(pretrained=True)
  • backbones.resnet_fpn.wide_resnet50_2(pretrained=True)
  • backbones.resnet_fpn.wide_resnet101_2(pretrained=True)

model

  • faster_rcnn.model(backbone=backbone, num_classes=len(stats.cat2name))

Train using FastAI

{% raw %}
monitor_metric = 'COCOMetric'
metrics = [ COCOMetric(metric_type=COCOMetricType.bbox)]

save_model_fname=f'{backbone_name}-{size}'
callbacks=[
    GradientAccumulation(bs), 
    SaveModelCallback(fname=save_model_fname, monitor=monitor_metric),
    EarlyStoppingCallback(monitor=monitor_metric, min_delta=0.001, patience=5)
]

learn = efficientdet.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics, cbs=callbacks)
learn.freeze()
learn.lr_find()
SuggestedLRs(lr_min=0.33113112449646, lr_steep=6.309573450380412e-07)
{% endraw %} {% raw %}
min_lr, epochs, freeze_epochs = 0.1, 1, 1 # Just to test, should be 100, 20 to get a good model 
print(f"Running with image size {size} for {freeze_epochs}+{epochs} epochs at min LR {min_lr}")
learn.fine_tune(epochs, min_lr, freeze_epochs=freeze_epochs)
Running with image size 384 for 1+1 epochs at min LR 0.1
epoch train_loss valid_loss COCOMetric time
0 nan nan 0.027079 19:06
Better model found at epoch 0 with COCOMetric value: 0.027078549137555315.
epoch train_loss valid_loss COCOMetric time
0 nan nan 0.027079 22:47
Better model found at epoch 0 with COCOMetric value: 0.027078549137555315.
{% endraw %}

Inference

{% raw %}
infer_ds = Dataset(valid_records[:4], valid_tfms)
infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=True)
samples, preds = efficientdet.predict_dl(model, infer_dl)
imgs = [sample["img"] for sample in samples]
show_preds(
    imgs=imgs[:4],
    preds=preds[:4],
    class_map=class_map,
    denormalize_fn=denormalize_imagenet,
    ncols=1,
    figsize=(36,27)
)

{% endraw %}

As you can see, training after only 2 epochs does not produce a usable model.

Saving Final Model Explicitly

Save the model explicitly once all the epochs have finished.

{% raw %}
final_model_save_path = f"models/{save_model_fname}-final.pth"
torch.save(model.state_dict(), final_model_save_path)
{% endraw %}

Inference w/ A Good Pretrained Model

Load a well-performing model saved from a past run of 100 epochs...

{% raw %}
model_save_path = f"models/{save_model_fname}-0.386.pth"
pretrained_model = efficientdet.model(model_name=backbone_name, num_classes=len(stats.cat2name), img_size=384)
pretrained_model.load_state_dict(torch.load(model_save_path))
<All keys matched successfully>
{% endraw %}

Run inference on four of the validation images...

{% raw %}
infer_ds = Dataset(valid_records[128:132], valid_tfms)
infer_dl = efficientdet.infer_dl(infer_ds, batch_size=4, shuffle=False)
samples, preds = efficientdet.predict_dl(pretrained_model.cuda(), infer_dl)
imgs = [sample["img"] for sample in samples]
show_preds(
    imgs=imgs[:4],
    preds=preds[:4],
    class_map=class_map,
    denormalize_fn=denormalize_imagenet,
    ncols=1,
    figsize=(36,27)
)

{% endraw %}

Code to Run in Python Standalone Script
